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Abstract 



The statistics of edge-localised plasma instabilities (ELMs) in toroidal magnetically con- 
fined fusion plasmas are considered. From first principles, standard experimentally moti- 
vated assumptions are shown to determine a specific probability distribution for the waiting 
times between ELMs: the Weibull distribution. This is confirmed empirically by a statis- 
tically rigorous comparison with a large data set from the Joint European Torus (JET). 
The successful characterisation of ELM waiting times enables future work to progress in 
various ways. Here we present a quantitative classification of ELM types, complementary 
to phenomenological approaches. It also informs us about the nature of ELMing processes, 
such as whether they are random or deterministic. 
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Introduction: Edge localised plasma instabilities (ELMs) [l-4] are almost ubiquitous 
in high performance magnetically confined fusion (MCF) plasmas. Their phenomenological 
properties are correlated with the quality of global energy confinement, and the peak energy 
fluxes onto material surfaces jsS]. Key challenges are to statistically characterise these 
processes sufficiently well that a quantitative distinction between different observed classes of 
ELMs becomes possible, and to relate this classification to the physical processes responsible 
for them. This will provide a test for theoretical models, and is an important step towards 
improved estimates for the distribution of ELM waiting times and sizes, both of which must 
be controlled in reactor-scale MCF plasma experiments. 

ELMs offer a rich and diverse experimental phenomenology jl-8]. There is intense the- 
oretical research on the instabilities that may be responsible for triggering them [9|], but 
few unifying principles have been identified. We will show that widely held experimentally 
motivated assumptions about ELMing require particular statistical characteristics. Specif- 
ically, if one assumes that the likelihood of ELM occurrence increases monotonically with 
time elapsed since the most recent ELM, then the measured distribution of waiting times 
between ELMs should belong to a broad class of probability density functions (pdfs) of 
which the Weibull distribution [10] is a special case. This physical approach contrasts with 
a trial and error search for a function that best fits the data [111 ]. 

To test this conjecture requires the identification and selection of a large representative 
data set, the development and use of a reliable ELM detection algorithm, and a method 
to find and compare the best possible fits between data and any proposed pdf. This will 
provide a rigorous basis for present and future studies. As an application of our analysis, we 
distinguish between type I and type III ELMs in a set of plasmas from the Joint European 
Torus (JET) tokamak 12j, on the basis of ELM waiting time statistics alone. Whereas type 



III ELMs are usually smaller than type I ELMs, typically they are more frequent and the 
plasma's energy confinement is lower. The ELM type is presently determined by the ELM 
frequency's response to heating 2n4|- The physically motivated derivation for our pdf allows 
a clear physical interpretation of our statistical classification. 

Theoretical Background: Consider the sequence and distribution of time intervals 
(waiting times) between ELMs. After an ELM, at t = 0, we discuss the statistical properties 
of the time of the next ELM in terms of two linked functions. We define p(t)dt to be the 
probability that the next ELM is in the time interval (t,t + dt), given that it has not yet 
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occurred at time t. This differs crucially from the pdf of time intervals between ELMs, which 
we denote by P(t), and gives the fraction of inter-ELM time intervals that are between t 
and (t + dt) as P(t)dt. Clearly p(t)dt is a conditional probability which, multiplied by the 
probability that no ELM occurs between t = and t, yields the probability P(t)dt of an 
inter-ELM time interval between t and t + dt. This gives the identity: 



(y)dy (i) 



which allows p(t) to be expressed in terms of P(t). Alternately, Eq. [TJcan be used to show 
that, 



d 







P(t) = --exp{- / p{y)dy\ (2) 



giving P{t) as a function of pit), with J °° P(t)dt = 1. The equivalence of Eqs. [T] and 
[2] can be confirmed by substituting Eq. [2] into Eq. [TJ or by writing Eq. [1] as, p(t) = 
—(d/di) In ^1 — J * P(y)dy S j, and substituting into Eq. [2j 

We adopt the experimentally motivated ansatz that for a short time period t m immedi- 
ately after an ELM, pit) = 0, beyond which it starts to increase. The simplest dimensionless 
representation of this hypothesis is, 

t <t m 

P{t)dt={ ( v0-i (3) 



t J t " - v ™ 

where t sets the time scale. Using Eq. [21 this gives, 

t <t 

P(t)dt= { / N/3-i 



to 



(4) 

to 1 — im 



This is a Weibull distribution [10|. It is specified by two dimensionless parameters (3 and 
oi = t m /t , the time scale being set by t . From a theoretical perspective, the values (3 = 1 
and (3 = 2 deserve special mention. Beyond a possible time delay t m , for (3 = 1, p(t) 
is constant, corresponding to a "memoryless" process in which events occur with equal 
probability independent of time. The transition between pit) being a concave (decreasing 
derivative) and convex (increasing derivative) function is at /3 = 2. As (3 increases, events 
appear increasingly regular. The preceding derivation assumes that events are independent 
and that the process causing them is stationary. 
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Data sets: Eq. H]will provide a good fit to a measured sequence of waiting times when 
the hypothesis represented by Eq. [3] holds. Such distributions have a single maximum, and 
require a macroscopic plasma equilibrium with a quasi-stationary ELMing process. Pdfs 
with additional maxima that are unlikely to have arisen from noise were discarded, as were 
data whose ELM type was uncertain. A search of carbon-wall JET data yielded a selection 
of 70 type I and 15 type III ELM data sets. The data sets each have a steady period of 
ELMy H-mode lasting between 3 and 6 seconds, and plasmas with an energy confinement 
time typically between 0.25 and 0.4 seconds. The data sets are listed in the supplementary 
material (SM) 13] . The need for quasi-stationary ELM statistics is met by the pulse length 
and quality of the JET plasmas studied, which is much improved on the 4 data sets studied 
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ELM detection: ELM detection algorithms typically examine the radiation associated 
with ELMs, using a threshold in amplitude to signal the start of an ELM, and a similar 
threshold or combination of thresholds to determine when an ELM has finished 11] . In those 
respects, our detection algorithm is the same. The advance of the algorithm described here 
is that the thresholds are determined from the data in a precise and statistically invariant 
way, so that we do not need to reset thresholds for different sets of data. This allows 
statistically robust comparisons between different data sets, and enables the technique to 
be used for non-steady-state and real-time situations if desired. Our algorithm examines 
the signal intensity of the Lyman-alpha radiation from Deuterium (D a ) at JET's inner 
divertor, and proceeds in two steps. First a scan is made of the data, obtaining for each 
time point the box-average and standard deviation of the signal intensity for a time interval T 
immediately prior to that point. The average and standard deviation determine a Gaussian 
distribution, that is subsequently used to distinguish ELMs automatically. For this study 
the (D a ) signal threshold for ELM-detection was for signal intensities that would only occur 
one time in twenty, based on the Gaussian distribution obtained from the data preceding 
the measurement in question. Once the signal has fallen below the average again, the ELM 
is considered to have finished. We use a time interval T = 0.41s that is much longer than 
the time between ELMs, but is reasonably short compared with changes to the plasma 
equilibrium. For stationary pulses such as those here, with ELM waiting times t <C T, 
results are unchanged by increasing T to the time duration of the entire dataset. For cases 
such as these, T is independent of the data. Because we are interested in classifying ELMs 
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by their statistical properties, here we chose the same threshold for both the type I and III 
data. The threshold of one in twenty was sufficiently sensitive for type III data, but kept 
noise tolerable in type I data. A systematic exploration of these thresholds will be presented 
elsewhere. 

The method just described provides a non-subjective method to determine when the D a 
signal intensity indicates an ELM. Because the study involves the detection and study of 
many thousands of ELMs, "incorrect" detection or omission of one or more ELMs becomes 
part of the experimental noise. The detection settings require only one value to be set in 
advance of an analysis, and because it does not need to be changed or optimised for any 
given set of data, it is easy and quick to analyse very large data sets. Also because thresholds 
are set independently of the data, it is possible to systematically mine noisy data by varying 
the noise and time-scale parameters to search for patterns in data that would otherwise be 
obscured. 

Best fit & goodness of fit: Both the Weibull and Gaussian distributions have free 
parameters that must be chosen to fit the data. A simple fit is provided by using the 
moments of the data, e.g. average, standard deviation, and skewness, to fit the parameters. 
More rigorously, we can consider the likelihood function for the probability of the data given 
the model being considered [l^] (e.g. the Weibull model, W), and parameters A, with, 

L(X) =P({U}\W,\) (5) 

where P({ti}\W,X) is the probability of observing the set of waiting times {tj}, given the 
assumption of a Weibull distribution (W), with fitting parameters A. The free parameters 



that maximise L(X) are their maximum likelihood (ML) estimate 14j, for which the likeli- 
hood of the data (given the distribution being considered), is a maximum. In practice the 
ML estimates are found by starting from the moment-fitted estimates and iterating to find 
A that maximises L(X). Given the best fits for two distributions Pa and Pb, we can compare 



their goodness of fit by calculating their likelihood ratio 



A{PA > PB) -pm\p B ,\ B ) (6) 

Under the assumption of independent {U}, the likelihood function and likelihood ratio can 
be expanded, with for example, P ({^}|Pa, A^) = n™ =1 P u^Pa, \a)- Whether Pa or P B is 
a better fit to the data is determined by whether A is greater, or less, than 1. 
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FIG. 1: Weibull (blue) and experimental pdfs (black bar chart), for JET pulse no. 57861 (type I 
ELMs). 

Eq. H] has one more free parameter than a Gaussian. Thus although Eq. H] might 
provide a best fit to the data, the model might not be better, because the fit used an extra 
parameter. A Bayesian analysis would introduce an extra factor [ijj] in Eq. [6] to account 
for this. However its influence will reduce, as the number of ELM time intervals increases. 
Unless the factor is of order 1/A it will not affect the decision for which is the best fit. For 
the classification of data, the most important issue is that the pdf (not the model), is a 
good fit. From that perspective the issue is not relevant. Eq. [6] rigorously indicates which 
pdf is the best fit, and for the large number of ELMs in our analyses, Eq. [6] is sufficient to 
determine whether the model is significantly better or worse than a Gaussian. 

An absolute measure of goodness of fit, is provided by dividing the ELM waiting time axis 
into intervals, calculating the fraction Pi of observed ELMs in each interval i, and calculating 
the co-efficient of variation cw = {(Pi — Pw(U)) 2 ) I ' (Pw(ti)) 2 between the observed (Pi) and 
the theoretical (P\y(ti)) values at the midpoint ti of the interval. This gives a normalised 
measure of the difference between the observed and theoretical pdfs, and provides an absolute 
measure for goodness of fit. It has the disadvantage of being dependent upon the number 
of data points used to generate the Pi. Small numbers of points will make cw susceptible to 
noise, increasing its value. The choice of time intervals will also affect cw, and consequently 
affect a fit that minimises cw- With enough data this would no longer be the case, but 
in practice it prevents cw from determining a unique best fit. For these reasons we use a 
maximum likelihood best fit, which is unique. Similarly if c\y is used to determine which 
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FIG. 2: Weibull (blue) and experimental pdfs (black bar chart), for JET shot no. 74417 (type III 
ELMs). 

pdf gives the best fit, the decision is in practice influenced by the choice of time intervals. 
ELM Classification: A full listing of the datasets studied, the time intervals over which 
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they were analysed, and the results from their analysis are presented in the SM 
a dataset with n ELMs, we substitute Eq. H]for Pa and a Gaussian for Pb in Eq. [6j then 
calculate the geometric mean A 1 /™ which will be of order 1. If A 1 /™ is greater (less) than 1.0 
then A will be much larger (smaller) for n> 1, indicating whether the Weibull is a better 
(worse) fit than a Gaussian. For the type I datasets (A 1//n ) = 1.01 ± 0.04, where the error 
of ±0.04 is the standard deviation, and n ~ 100. Using time intervals of 2.5 x 10~ 3 s, the 
coefficient of variation between the fitted and observed pdfs is (cw) = 0.63 ± 0.22 for the 
Weibull best fits, and (cg) = 0.63 ±0.20 for the Gaussian best fits. For the type III datasets 
(A 1 /") = 1.51 ± 0.15, with n ~ 300 or larger, (c w ) = 0.70 ± 0.23, and (c G ) = 1.25 ± 0.24. 
Typical examples are in Figs. |4] and |5j Whereas the fits are similarly good for type I ELMs, 
the Weibull distribution is the clear best fit for type III ELMs. Substantially improved fits 
are likely if outliers are removed by improved data, improved ELM detection techniques, or 
with some algorithm. The values of cw and cq can be reduced if the best fit minimises them 
instead of A. 

Figure HDplots a and (3 for the type I and type III ELM datasets. There is a clear clustering 
of type III data for (3 = 1 and a < 0.5. As noted earlier, (3 = 1 has special significance 
because beyond an initial time delay t m , it corresponds to a "memoryless" process in which 
the probability of an ELM is independent of time. The type I data has a wide spread in 
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FIG. 3: Maximum-likelihood best fits to Eq. U) type I ELM database (black diamonds), type 
III database (blue squares), and some high frequency ELMs (red triangles). Type III data is 
characterised by /? ~ 1, whereas all other data has (3 > 2. 

a and (3, but notably remains of order 2 or larger. As increases, ELMs will appear 
increasingly regular. Therefore the type I ELMs studied here are consistent with a process 
whereby the probability of an ELM increases with time since the previous ELM, possibly due 
to the build-up of some physical quantity with time. The similarly good agreement between 
the Gaussian and Weibull fits allows the alternative interpretation that type I ELMs have 
a specific frequency that is broadened by noise, and that the good fit to type III ELM data 
is coincidental. This is possible, although our original hypothesis is consistent with present 
ELM models, and explains the good fit to both the type I and type III data. To avoid 
disagreement about the classification of ELM types, our dataset excludes ELMs whose type 
is uncertain. Therefore it is possible that there is a continuum between the classifications 
that would not be observed in our data set of typical type I and type III ELMs. 

As an example we analysed JET plasmas 66105-66109, whose ELM frequency is typical 
of type III ELMs 2-^, f|, but whose D a signal is visually similar to that of type I ELMs. 
Based on Fig. [6j they are not type III ELMs. 

Conclusions: We have shown how simple experimentally motivated assumptions require 
a Weibull pdf for inter-ELM waiting times. The model applies to stationary processes. A 
search of JET data yielded 64 sufficiently long and steady plasmas to test the model, details 



of which are in the SM |l_3j . A statistically rigorous ELM detection technique was developed 



to compare the data sets from experiments many years apart. The method uses a single 
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dimensionless threshold that is set independently of the data, and a single time-period, 
allowing rapid objective comparisons between different data sets. The dataset was analysed, 
and a maximum likelihood best fit calculated, finding a good Weibull fit to both type I and 
type III data. Therefore we explored whether the dimensionless fitting co-efficients a and /3 
could be used to classify the data, concluding that they can. The classification has a clear 
interpretation - type III ELMs are consistent with a memoryless process, but type I ELMs 
are consistent with the build-up of a quantity with time, leading to instability. In contrast, 
present ELM classification requires either a subjective judgment, or experimental time to 
determine how ELM frequency responds to heating 2|-|4|. 

To summarise, we have shown that a rigorous statistical analysis of ELM waiting times 
is possible, that it can provide a quantitative classification of ELM types, and physical 
insight into the processes responsible for them. The methods have numerous potential future 
applications, especially for the longer plasma pulses planned for ITER 15j. These include 
data mining, use in real-time and for other signals, and a quantitative characterisation of 
the response of ELM sequences to external parameters. 
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FIG. 4: Weibull (blue) and experimental pdfs (black bar chart), for JET pulse no. 57861 (type I 
ELMs). 
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FIG. 5: Weibull (blue) and experimental pdfs (black bar chart), for JET shot no. 74417 (type III 
ELMs). 
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FIG. 6: Maximum-likelihood best fits to Eq. U) type I ELM database (black diamonds), type 
III database (blue squares), and some high frequency ELMs (red triangles). Type III data is 
characterised by /? ~ 1, whereas all other data has f3 > 2. 
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General Remarks: An objective of this paper was to determine whether a good fit 
to a large variety of data is possible by Eq. 4. Nonetheless, it was regarded essential that 
the sets have approximately constant NBI heating and gas fuelling, and that they have 
approximately constant central density, and energy confinement. As discussed in the main 
text, datasets whose ELM waiting time pdf have two or more clear maxima of comparable 
sizes are not included. Some plasmas also have (approximately constant) ICRH heating 
during the time period analysed. 

In the following tables of data, the first column is the JET pulse number, t\ and t 2 give 
the time at which the time series analysis of D a data started and ended respectively, B T 
is the toroidal field in Tesla, I p is the toroidal plasma current in Mega Amps, the other 
parameters are defined in the main text. 
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Figure 1: Example Dq, time traces are shown for JET pulses (from the top down): 61480 (a = 0.02, 
= 1.8), 72343 (a = 0.0, = 9.2), 67761 (a = 1.2, = 6.0), and 73341 (a = 2.8, = 1.9). 

For illustrative purposes, Figure Q] includes a selection of D Q time traces. The examples 
are chosen from the four extremities of our a-fi plot in Figure 3 of the main text, and are 
shown for the time period of 60s-63s. The 60s-63s time window was chosen because it is 
included in the analysis of all the four pulses shown. From top to bottom in Figure [U or 
clockwise from bottom left in the D Q plot of Figure 3 of the main text, the pulses are: 61480 
(a = 0.02, p = 1.8), 72343 (a = 0.0, = 9.2), 67761 (a = 1.2, = 6.0), and 73341 (a = 2.8, 
= 1.9). 
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"High frequency" type I ELMs 
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